Skip to content

Add inner_dtypes to NestedDtype for sub-column dtype casting #230

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 6 commits into from

Conversation

hombit
Copy link
Collaborator

@hombit hombit commented Apr 5, 2025

It is the first step for multiple nesting implementation.

This is a draft PR, because I found the implementation to be over-complicated. Probably the better solution is using list-struct as the default data representation instead of this inner_dtypes system.

Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

Copy link

codecov bot commented Apr 5, 2025

Codecov Report

Attention: Patch coverage is 98.83721% with 1 line in your changes missing coverage. Please review.

Project coverage is 98.93%. Comparing base (44e986d) to head (85cdd7e).
Report is 79 commits behind head on main.

Files with missing lines Patch % Lines
src/nested_pandas/series/ext_array.py 97.22% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #230      +/-   ##
==========================================
+ Coverage   98.24%   98.93%   +0.68%     
==========================================
  Files          14       14              
  Lines        1254     1315      +61     
==========================================
+ Hits         1232     1301      +69     
+ Misses         22       14       -8     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link

github-actions bot commented Apr 5, 2025

Before [cf7f4ce] <v0.3.8> After [bd10671] Ratio Benchmark (Parameter)
25.2±0.6ms 25.5±0.6ms 1.01 benchmarks.AssignSingleDfToNestedSeries.time_run
9.76±0.3ms 9.83±0.07ms 1.01 benchmarks.NestedFrameAddNested.time_run
270M 269M 1 benchmarks.AssignSingleDfToNestedSeries.peakmem_run
92M 92.1M 1 benchmarks.NestedFrameAddNested.peakmem_run
96.5M 96.8M 1 benchmarks.NestedFrameQuery.peakmem_run
95.8M 95.9M 1 benchmarks.NestedFrameReduce.peakmem_run
289M 288M 1 benchmarks.ReassignHalfOfNestedSeries.peakmem_run
10.8±0.2ms 10.7±0.03ms 0.99 benchmarks.NestedFrameQuery.time_run
1.25±0ms 1.24±0.02ms 0.99 benchmarks.NestedFrameReduce.time_run
40.3±7ms 39.7±8ms 0.99 benchmarks.ReassignHalfOfNestedSeries.time_run

Click here to view all benchmarks.

@hombit hombit requested a review from Copilot April 7, 2025 14:47
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 9 out of 9 changed files in this pull request and generated no comments.

Comments suppressed due to low confidence (1)

src/nested_pandas/series/ext_array.py:248

  • The variable 'infered_inner_dtypes' appears to be misspelled. Consider renaming it to 'inferred_inner_dtypes' for clarity.
pa_array, infered_inner_dtypes = cls._box_pa_array(scalars, pa_type=pa_type)

@hombit hombit requested a review from Copilot April 7, 2025 14:59
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot reviewed 9 out of 9 changed files in this pull request and generated no comments.

Comments suppressed due to low confidence (2)

src/nested_pandas/series/packer.py:246

  • Ensure that merging inner_dtypes with the '|' operator handles duplicate keys as intended so that any overlapping field types are combined correctly without unintentionally overriding useful type information.
dtype = NestedDtype(inferred_dtype.pyarrow_dtype, inner_dtypes=inner_dtypes | inferred_dtype.inner_dtypes)

src/nested_pandas/series/ext_array.py:692

  • When accumulating inner_dtypes from multiple items in _box_pa_array, verify that merging dtypes using update() does not lead to inconsistent or conflicting type information if the same field is encountered with different inferred types.
scalar, dtypes = cls._box_pa_scalar(v, pa_type=pa_type)

@hombit
Copy link
Collaborator Author

hombit commented Apr 22, 2025

This approach is way too complicated, started a new implementation at #242

@hombit hombit closed this Apr 22, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant